Abstract

The primate α-/θ-defensin multigene family encodes versatile endogenous cationic and amphipathic peptides that have
broad-spectrum antibacterial, antifungal and antiviral activity. Although previous studies have reported that α-/θ-defensin
(DEFA/DEFT) genes are under birth-and-death evolution with frequent duplication and rapid evolution, the phylogenetic
relationships of the primate DEFA/DEFT genes; the genetic bases for the existence of similar antimicrobial spectra among
closely related species; and the evolutionary processes involved in the emergence of cyclic θ-defensins in Old World
monkeys and their subsequent loss of function in humans, chimpanzees and gorillas require further investigation. In this
study, the DEFA/DEFT gene repertoires from primates and treeshrews were collected, followed by detailed phylogenetic,
sequence and structure, selection pressure and comparative genomics analyses. All treeshrew, prosimian and simian
DEFA/DEFT genes are grouped into two major clades, which are tissue-specific for enteric and myeloid defensins in simians. The
simian enteric and myeloid α-defensins are classified into six functional gene clusters with diverged sequences, variable
structures, altered functional constraints and different selection pressures, which likely reflect the antimicrobial spectra
among closely related species. Species-specific duplication or pseudogenization within each simian cluster implies that the
antimicrobial spectrum is ever-shifting, most likely challenged by the ever-changing pathogen environment. The DEFT
evolved from the myeloid DEFA8. The prosegment of θ-defensin is detected with adaptive changes coevolving with the new
protein fold of the mature peptide, consistent with the importance of the prosegment for the correct folding of the mature
peptide. Lastly, a less-is-hitchhiking hypothesis is proposed as a possible explanation for the expansion of pseudogene
DEFTP and the loss of functional DEFT, where the gain or loss of the hitchhiker is determined by its adjacent driver gene
during the birth-and-death evolutionary process.
Introduction

Host immune systems evolve various defensive weapons against
pathogens during the pathogen-host arms race. Defensins are
small antimicrobial peptides that act as both effectors and
mediators of host immunity. These endogenous cationic and
amphipathic peptides have broad-spectrum antibacterial, antifungal
and antiviral activity [1-6]. They also modulate the innate and
adaptive immune systems by promoting or suppressing the
proinflammatory responses during microbial infection [7-9]. In
mammals, there are three structural subfamilies of defensins
(α-, β- and θ-defensins) that differ by their tridisulfide motifs. The
β-defensins are the oldest members and are found in most classes of
vertebrates [10-13]. The α-defensins, which originate from the
β-defensins, are younger members. The α-defensins are found in
marsupials and most mammals but were lost in some Laurasiatheria
species, such as cattle and dogs [14-17]. The θ-defensins
are the only mammalian cyclic peptides currently known and
originate from α-defensins through a nonsense mutation in the
mature peptide. The θ-defensins are found in Old World monkeys
and certain apes but underwent a loss of function in humans,
chimpanzees and gorillas due to a nonsense mutation in the signal
peptide [18,19].

Primate α-/θ-defensin (DEFA/DEFT) genes are located as
multigene clusters on chromosome regions that are homologous
with human chromosome 8p23 [16,20]. Like other multigene
families, DEFA/DEFT genes can be constitutively expressed at
high levels to produce variant functional proteins. In humans,
there are six functional and tissue-specific α-defensin peptides.
Human HNP1-HNP4, encoded by the DEFA1, DEFA3 and
DEFA4 genes, are primarily expressed in neutrophils [1]. The
DEFA1 and DEFA3 genes are genetic variants that encode proteins
with a single amino acid difference and are also referred to as
DEFA1A3 genes. Unlike the single-copy DEFA4, the DEFA1A3
genes have copy number polymorphisms [21-23]. In contrast,
human HD5 and HD6, encoded by DEFA5 and DEFA6,
respectively, are expressed primarily in Paneth cells of the small
intestine and play important roles in intestinal host defense and
homeostasis [24-26]. Apart from the functional genes, there are
many defensin pseudogenes (DEFAP and DEFTP) in humans
[19,27]. In macaques, the new member θ-defensin encoded by
DEFT is primarily expressed in the bone marrow and leukocytes
[28].

Defensins are synthesized as pre-pro-defensins containing a
signal peptide, a prosegment and a mature peptide. The
prosegment, which serves as an intramolecular chaperone, assists
in the correct disulfide pairing and proper folding of the mature
peptide [29] and keeps the mature peptide inactive until it is
cleaved by various proteolytic enzymes [30-32]. The mature
peptides are cationic and amphipathic, which are important
properties for inducing the depolarization and permeabilization of
the microbial membrane [2,33]. The α-defensin monomer has a
three-stranded antiparallel β-sheet structure with three intramolecular
disulfide pairs linked as Cys1-Cys6, Cys2-Cys4 and Cys3-Cys5.
Two monomers form an amphipathic dimer, which is
stabilized by hydrophobic interactions and intermolecular hydrogen
bonds between residues 18 and 20 (HNP4 numbering) in the
second β-sheet [33,34]. The dimerization of α-defensins, in
addition to their cationic and amphipathic character, is also
important for their antimicrobial ability [35-37]. In contrast to the
structure of α-defensins, the θ-defensins form a cyclic octadecapeptide
through the posttranslational head-to-tail ligation of two
nonapeptides and harbor three intermolecular disulfide pairs [18].
Recently, synthetic defensins have been studied and are being
developed as potential antimicrobial peptide drugs [38-41].

Because of the frequent duplication and rapid evolution of
primate α-/θ-defensins, the nomenclature and phylogenetic
relationships among this multigene family are still ambiguous.
Moreover, there is no clear phylogenetic classification related to
the expression pattern or the confounding antimicrobial function
of these α-defensins, although many functional studies indicate
that α-defensins are effective microbicidal peptides against a wide
variety of microorganisms. Previous studies have demonstrated
that the α-/θ-defensin multigene family, like many other
multigene families, is subject to a birth-and-death evolutionary
process with frequent gene duplication, pseudogenization and
significant positive selection [42-44]. However, the molecular
evolution of the undocumented antimicrobial spectra that are
composed of functionally divergent α-/θ-defensins in humans and
closely related primates should be further explored. In this study,
the phylogenetic classification, sequence divergence and structural
diversification of the primate α-/θ-defensins were investigated
using molecular evolution and molecular dynamics analyses.
Furthermore, the evolutionary processes involved in the emergence
of cyclic θ-defensins and their subsequent loss of function in
humans, chimpanzees and gorillas require investigation. Loss of
function is a major driving force for phenotypic change and can be
advantageous, deleterious or tolerated, as explained by the
hypotheses of less-is-more, less-is-less and less-is-nothing,
respectively [45]. Because of the frequent gene duplication and
functional redundancy within multigene families, losing certain
members is usually "tolerable". However, the θ-defensins have
significant antiviral activity that is higher than that of α-defensins
and β-defensins, including anti-HIV-1 activity [39,46]. The loss of
functional θ-defensin seems to be "deleterious". Therefore, the
expansion of pseudogene DEFTP and the loss of functional DEFT
in humans, chimpanzees and gorillas cannot be well explained by
the three hypotheses noted above. The evolutionary fate of
θ-defensin genes may be influenced by other factors, which should
be further investigated. In this study, a less-is-hitchhiking
hypothesis was proposed to explain this process, using comparative
genomics analyses.

Materials and Methods 

Sequence identification 

To identify sequences for the primate α-/θ-defensin gene
family, a literature search and exhaustive genome BLAST searches
were carried out. First, primate DEFA/DEFT sequences were
collected from the literature and defined as queries. Second,
sequence-based searches with both BlastN and BLAT were
performed using these queries, and these BlastN and BLAT
searches were repeated using the newly identified genes as queries
until no additional genes could be identified. Last, all identified
genes were manually classified as functional genes or pseudogenes.
Primate DEFA/DEFT DNA sequences or completely assembled
genomes from the NCBI database or the UCSC Genome Browser
were used, including human (Homo sapiens; assembly: GRCh37.p5;
March 2011), chimpanzee (Pan troglodytes; assembly:
Pan_troglodytes-2.1.4; March 2011), gorilla (Gorilla gorilla; assembly:
gorGor3.1; October 2011), Sumatran orangutan (Pongo abelii;
assembly: AC206038.3; August 2008), northern white-cheeked
gibbon (Nomascus leucogenys; assembly: Nleu1.0; October 2011),
siamang (Symphalangus syndactylus; sequences: AY128121.1 and
AY128122.1), macaque (Macaca mulatta; assembly: rheMac2;
January 2006), crab-eating macaque (Macaca fascicularis;
assemblies: AEHL01390702.1 and CAEC01613298.1; October 2011),
olive baboon (Papio anubis; assembly: AC116559.30; September
2010), pig-tailed macaque (Macaca nemestrina; AY128123.1), eastern
black-and-white colobus (Colobus guereza kikuyuensis; AY128124),
marmoset (Callithrix jacchus; assembly: calJac3; June 2007), Bolivian
squirrel monkey (Saimiri boliviensis boliviensis; assembly: saiBol1;
November 2011), gray mouse lemur (Microcebus murinus; assembly:
ABDG01305195.1; July 2007), bush baby (Otolemur garnettii;
assembly: AAQR03188273.1; March 2011), and Philippine tarsier
(Tarsius syrichta; assembly: ABRT010372935.1; September 2008).
The treeshrews (northern treeshrew Tupaia belangeri; assembly:
AAPY01804345.1; June 2006; Chinese treeshrew Tupaia belangeri
chinensis; assembly: ALAR 1000000; February 2013), which are
close relatives of primates, were also used. The exon-intron
structures were determined by GeneWise. Sequences that encoded
α-/θ-defensins containing the six conserved cysteines (for DEFA)
or the three conserved cysteines (for DEFT) were identified as
functional genes. Sequences with frame-shift (insertion or deletion)
mutations or premature stop codons were considered pseudogenes.
A list of identified sequences is detailed in Table S1.
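
The functional-versus-pseudogene rule described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, where the check was manual. The function names, the "unclassified" fallback and the use of a length check as a proxy for frame-shift detection are assumptions made for illustration. Counting cysteines over the whole translated sequence is also only a crude stand-in for checking the conserved positions in the mature peptide.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate a coding sequence codon by codon ('X' for unknown codons)."""
    cds = cds.upper()
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X")
                   for i in range(0, len(cds) - 2, 3))


def classify(cds, required_cysteines):
    """Hypothetical helper: label a coding sequence.

    required_cysteines is 6 for DEFA and 3 for DEFT, following the text.
    A length that is not a multiple of three is used here as a simple
    proxy for a frame-shift; a real check would compare against an
    alignment of functional genes.
    """
    if len(cds) % 3 != 0:
        return "pseudogene"
    protein = translate(cds)
    body = protein[:-1] if protein.endswith("*") else protein
    if "*" in body:  # premature stop codon
        return "pseudogene"
    if body.count("C") >= required_cysteines:
        return "functional"
    return "unclassified"  # assumption: the source does not define this case
```

For example, `classify("ATG" + "TGT" * 6 + "TAA", 6)` returns `"functional"`, whereas inserting an in-frame `TGA` after the start codon yields `"pseudogene"`.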

Molecular evolutionary analyses 

To understand the evolutionary processes of the primate α-/θ-defensin
gene family, molecular evolutionary analyses and
comparative genomics analyses were performed. Coding regions
of the DEFA/DEFT functional genes were aligned codon-to-codon
using the MUSCLE program as implemented in the MEGA6
software [47], and then corresponding regions of the pseudogenes
were manually matched to the multiple sequence alignment of the
functional genes.

Phylogenetic trees were built using either the more conserved
signal-prosegment region or the entire coding region. To obtain
reliable phylogenetic relationships, three different tree-building
methods were combined. Bayesian inference (BI) phylogenetic
trees were constructed using the BEAST software [48].
Neighbor-joining (NJ) and maximum likelihood (ML) phylogenetic trees
were constructed using the MEGA6 software [47]. The BI trees
were computed using the general time-reversible (GTR) nucleotide
substitution model, and the site heterogeneity model was assumed
with a discrete gamma distribution for among-site rate variation
and a proportion of invariant sites (G + I; 4 categories). Two
independent Markov chain Monte Carlo (MCMC) runs were
conducted for 10,000,000 steps, with a sampling frequency of
every 1000 steps, a UPGMA starting tree and other default
parameters. When the effective sample sizes (ESSs) of all quantities
were larger than 200, the MCMC was considered converged, and
the results were analyzed with 20% burn-in. The best nucleotide
substitution patterns for ML trees were selected based on the
analyses of best-fit models in MEGA6. The ML trees were
computed using the Kimura 2-parameter substitution model (K2)
with a bootstrap test (1000 replicates), and the rate variation model
was allowed with a discrete gamma distribution (4 categories). The
evolutionary distances for NJ trees were computed using the
Kimura 2-parameter method with a bootstrap test (1000
replicates), and the rate variation among sites was modeled with
a gamma distribution (gamma parameter = 2.6 for the signal-prosegment
region and 1.4 for the entire coding region).
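
As an illustration of the distance measure named above, the following Python sketch computes the Kimura 2-parameter distance between two aligned nucleotide sequences, with an optional gamma correction for rate variation among sites. It is a minimal sketch of the textbook formulas and not the MEGA6 implementation. The function name `k2p_distance` and the decision to skip gapped or ambiguous columns are assumptions.

```python
import math

# Transitions are purine-purine or pyrimidine-pyrimidine changes.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2, gamma_shape=None):
    """Kimura 2-parameter distance for two aligned sequences.

    P and Q are the proportions of transitional and transversional
    differences. Without gamma correction:
        d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    With gamma shape a:
        d = a/2 [ (1 - 2P - Q)^(-1/a) + 1/2 (1 - 2Q)^(-1/a) - 3/2 ]
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    n = ts = tv = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases (assumption)
        n += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n
    w1, w2 = 1.0 - 2.0 * p - q, 1.0 - 2.0 * q
    if w1 <= 0.0 or w2 <= 0.0:
        raise ValueError("sequences too divergent for the K2P correction")
    if gamma_shape is None:
        return -0.5 * math.log(w1) - 0.25 * math.log(w2)
    a = gamma_shape
    return (a / 2.0) * (w1 ** (-1.0 / a) + 0.5 * w2 ** (-1.0 / a) - 1.5)
```

Passing `gamma_shape=2.6` or `gamma_shape=1.4` would mirror the gamma parameters reported for the signal-prosegment region and the entire coding region, respectively.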

The phylogenetic trees of BI, NJ and ML were combined using 
the TreeGraph2 software [49]. The BI trees were selected as 
background trees, and the clades or clusters of similar topologies in 
the BI, NJ and ML trees were labeled with posterior probabilities 
(BI) and bootstrap support values (NJ and ML). 

The DEFA/DEFT genes from the human, chimpanzee,
orangutan, macaque and marmoset genomes were used for
synteny analysis by mapping their loci onto chromosomes to scale.

Functional divergences (type I and type II) between clusters
were tested using the Gu99 and Gu2013 probabilistic models
implemented in the DIVERGE 2.0 software [50,51]. Pairwise
coefficients ($\theta_I \pm$ SE) and likelihood ratio statistics (LRT) for simian
enteric or myeloid defensin clusters were analyzed using the
bootstrapped Gu99 probabilistic method. Functional distance
analysis was performed based on the Wang and Gu 2001 method
[52]. The type I functional distance ($d_F$) between clusters was
defined as $d_F = -\ln(1 - \theta_I)$. For two clusters A and B,
$d_F(A,B) = b_F(A) + b_F(B)$; thus, the functional branch length, $b_F$, for
each cluster can be calculated using a standard least-squares
method. A large $b_F$ value for a gene cluster indicates that the
evolutionary conservation may be shifted at many sites. The
effective number of sites ($n_e$) related to functional divergences was
estimated following the Gu2013 protocol. Type I sites, typically
generated by type I functional divergence, are conserved in one
gene cluster but highly variable in the other following gene
duplication, whereas type II sites, which are primarily caused by
type II functional divergence, refer to radical changes of conserved
amino acids at the same site in both gene clusters.
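
The functional distance analysis can be sketched in Python as follows, assuming the relations d_F = -ln(1 - θ) for the type I functional distance and d_F(A,B) = b_F(A) + b_F(B) for the additive branch lengths. This is a minimal illustration and not the DIVERGE implementation. The function names and the example distances are hypothetical.

```python
import math


def functional_distance(theta):
    """Type I functional distance d_F = -ln(1 - theta)."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta)


def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; at least three clusters are needed")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def branch_lengths(clusters, distances):
    """Least-squares b_F for each cluster, given pairwise d_F values.

    distances maps a pair (A, B) to d_F(A, B); each pair contributes one
    equation b_F(A) + b_F(B) = d_F(A, B). The normal equations are then
    solved directly.
    """
    idx = {c: i for i, c in enumerate(clusters)}
    n = len(clusters)
    ata = [[0.0] * n for _ in range(n)]
    atb = [0.0] * n
    for (x, y), d in distances.items():
        i, j = idx[x], idx[y]
        for p in (i, j):
            atb[p] += d
            for q in (i, j):
                ata[p][q] += 1.0
    return dict(zip(clusters, solve_linear(ata, atb)))


# Illustrative pairwise distances for three hypothetical clusters.
example = {("A", "B"): 1.26, ("A", "C"): 1.40, ("B", "C"): 0.78}
lengths = branch_lengths(["A", "B", "C"], example)
```

With exactly three clusters the system is fully determined, so the least-squares solution reproduces the pairwise distances exactly. With more clusters it gives the best additive fit.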

To investigate the selection pressure, maximum likelihood
estimations of positive selection were performed using the
site-specific models from the CODEML program implemented in the
PAML 4.5 package [53]. Parameters for the models M0 (one
ratio), M1a (neutral), M2a (selection), M7 (beta), M8 (beta and ω,
where ω is equivalent to Ka/Ks) and M8a (beta and ω = 1) were calculated.
The M0 model assumes a uniform selective pressure among sites.
The M1a model assumes a variable selective pressure but no
positive selection. The M2a model assumes a variable selective
pressure with positive selection. The M7 model assumes a
beta-distributed variable selective pressure. The M8 model assumes a
beta-distributed variable selective pressure plus positive selection.
The M8a model assumes a beta-distributed variable selective pressure
without positive selection. Three likelihood ratio tests (LRTs) were
performed using the following paired models: M1a-M2a, M7-M8
and M8a-M8.
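
A likelihood ratio test between nested site models can be sketched as below. The degrees of freedom are an assumption, because the text does not state them. Two degrees of freedom are commonly used for the M1a-M2a and M7-M8 comparisons, and one for M8a-M8. The function name `lrt_pvalue` and the log-likelihood values in the usage line are hypothetical.

```python
import math


def lrt_pvalue(lnl_null, lnl_alt, df):
    """Return (statistic, p-value) for a likelihood ratio test.

    The statistic 2 * (lnL_alt - lnL_null) is compared with a chi-square
    distribution. Closed-form survival functions are used, so only
    df = 1 and df = 2 are supported in this sketch.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("only df = 1 or df = 2 are supported here")
    return stat, p


# Hypothetical log-likelihoods for a null and an alternative model.
stat, p = lrt_pvalue(-1000.0, -990.0, df=2)
```

A small p-value favors the model that allows positive selection (M2a or M8) over its null counterpart.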



To investigate changes in the selection pressure along the DEFA
and DEFT sequences, amino acid identities and average pairwise
Ka/Ks ratios were computed using a sliding window method with
a window size of 10 residues and a step size of 5 residues. Within
each window, the numbers of nonsynonymous and synonymous
substitutions per site (denoted as Ka and Ks, respectively) were
calculated using the Nei and Gojobori (1986) method incorporated
into the KaKs_Calculator software [54]. The values of Ka/Ks
were reanalyzed using a bootstrap test of 1000 replicates to assess
the effect of potential biases from a few genes in the gene set and to
test the statistical significance of the difference between the
average Ka/Ks ratio and 1. Given N original
sequences, the average Ka/Ks from a random resampling of N
sequences was calculated, and the resampling process was
repeated 1000 times. The histogram for the bootstrap estimates
of average Ka/Ks showed a normal distribution, and the
parameters of the normal distribution were calculated. The null
hypothesis (H0) was that there was no difference between Ka/Ks
and 1, and the alternative hypothesis (H1) was that there was a
difference between Ka/Ks and 1. If the average Ka/Ks for the
original sequences was larger than 1, H0 was defined as
"Ka/Ks ≤ 1" and H1 was defined as "Ka/Ks > 1." If the average
Ka/Ks for the original sequences was smaller than 1, H0 was
defined as "Ka/Ks ≥ 1" and H1 was defined as "Ka/Ks < 1."
The p-value was calculated based on the parameters of the normal
distribution.
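
The sliding-window layout and the bootstrap test can be sketched in Python as follows. This is a simplified illustration under stated assumptions. It resamples a list of per-sequence Ka/Ks values instead of recomputing pairwise Ka/Ks from resampled sequences. It takes the p-value as the mass of the fitted normal distribution on the null side of 1. The function names and the seed are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def sliding_windows(length, size=10, step=5):
    """Half-open (start, end) residue windows along a sequence."""
    return [(s, s + size) for s in range(0, length - size + 1, step)]


def bootstrap_pvalue(values, replicates=1000, seed=0):
    """One-sided bootstrap test of the average Ka/Ks against 1.

    Returns (observed_mean, p_value). If the observed mean is above 1,
    H0 is "Ka/Ks <= 1"; otherwise H0 is "Ka/Ks >= 1".
    """
    rng = random.Random(seed)
    n = len(values)
    observed = mean(values)
    # Resample N values with replacement and record the mean each time.
    boot = [mean(rng.choices(values, k=n)) for _ in range(replicates)]
    mu, sigma = mean(boot), stdev(boot)
    if sigma == 0.0:
        raise ValueError("bootstrap means show no variation")
    fitted = NormalDist(mu, sigma)
    if observed > 1.0:
        p = fitted.cdf(1.0)        # probability of Ka/Ks <= 1
    else:
        p = 1.0 - fitted.cdf(1.0)  # probability of Ka/Ks >= 1
    return observed, p
```

For a 25-residue region, `sliding_windows(25)` yields the windows (0, 10), (5, 15), (10, 20) and (15, 25).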

Sequence logos were generated from the web server WebLogo 
[55]. 

Molecular dynamics simulations 

Structures of the α- and θ-defensins were searched using BlastP
against the Protein Data Bank (PDB) database. One representative
structure for each simian α-defensin cluster was analyzed,
including human HNP1 (PDB code: 3GNY) [56] representing
the DEFA1 cluster, human HNP4 (PDB code: 1ZMM) for the
DEFA4 cluster, human HD5 (PDB code: 1ZMP) for the DEFA5
cluster and human HD6 (PDB code: 1ZMQ) for the DEFA6
cluster [34]. No native structures were solved for the DEFA8 and
DEFA9 clusters; thus, homology models of the rhesus macaque
DEFA8 and marmoset DEFA9a were built using the MODELER
automodel method [57] implemented in the Discovery Studio 3.1
software. The corresponding template structures selected for this
modeling were HNP1 (41.9% sequence identity with DEFA8) and
HD5 (39.4% sequence identity with DEFA9). Monomers were
aligned into dimers by superimposing four intermolecular
hydrogen bonds between the backbone residues 18 and 20
(HNP4 numbering) through a customized Tcl script running in
the VMD 1.9 software [58].

Molecular dynamics (MD) simulations were performed following
the method described below. The simulation system was
parameterized by applying the standard AMBER force field
ff99SB for bio-organic systems using the leap module implemented
in the AmberTools 12 package [59]. The preparative processes
included adding hydrogen atoms and disulfide bonds, neutralizing
with counter ions (Cl⁻) and solvating in a periodic box with TIP3P
water to at least a 10-Å distance around the protein. Molecular
dynamics simulations were then performed using the NAMD 2.7
software [60]. The entire simulation process was composed of the
following steps: minimization, heating, equilibration and production.
Three rounds of energy minimization were performed by
releasing the restraints in a stepwise fashion. In the first round, all
protein atoms were restrained for 50,000 steps with 2.0 kcal/(mol·Å²)
restraints. In the second round, only atoms of the
backbone were restrained for 50,000 steps with 2.0 kcal/(mol·Å²)
restraints. In the third round, all atoms were relaxed for 100,000
steps. Short-range electrostatic and van der Waals interactions
were truncated at 9 Å, whereas long-range electrostatic forces were
computed using the particle-mesh Ewald (PME) summation
method. Next, the system was gently annealed from 0 to 310 K
using a Langevin thermostat with a coupling coefficient of
5.0 ps⁻¹ and employing a force constant with 2.0 kcal/(mol·Å²)
restraints. All subsequent steps were carried out in the
isobaric-isothermal NPT ensemble with a target pressure of 1 atm. Five
rounds of equilibration (100 ps each, at 310 K) were performed
with decreasing restraint weights from 2.0, 1.5, 1.0, 0.5 to
0.0 kcal/(mol·Å²). By releasing all the restraints, the system was
again equilibrated for 500 ps. After the final equilibration step, the
production phase of MD was run without any restraints for a total
of 16 ns.
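
The equilibration and production schedule described above can be written down as a small data structure, which makes the total simulated time easy to check. This is only a bookkeeping sketch in Python and not a NAMD configuration. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    duration_ps: float
    restraint: float  # harmonic restraint weight in kcal/(mol·Å²)


def equilibration_schedule():
    """Five restrained rounds of 100 ps, then 500 ps without restraints."""
    weights = [2.0, 1.5, 1.0, 0.5, 0.0]
    stages = [Stage(f"equil_{i + 1}", 100.0, w) for i, w in enumerate(weights)]
    stages.append(Stage("equil_free", 500.0, 0.0))
    return stages


def total_time_ns(stages, production_ns=16.0):
    """Equilibration time plus the unrestrained production run, in ns."""
    return sum(s.duration_ps for s in stages) / 1000.0 + production_ns
```

Under these numbers the equilibration stages add up to 1 ns, so `total_time_ns(equilibration_schedule())` gives 17 ns of NPT simulation per system.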

Tertiary structures and electronic properties were analyzed after
performing MD simulations. The flexibility of the dimer was
evaluated using two distances and a dihedral angle involving four
highly flexible residues. The distance between the Cα atoms of
residue 22 in the two monomers of each dimer (HNP4 numbering) was defined as
A22Cα-B22Cα. The distance between the Cα atoms of residue 11
in the two monomers of each dimer was defined as A11Cα-B11Cα. The dihedral angle
was defined as A11Cα-A22Cα-B22Cα-B11Cα. Distances and
dihedral angles were extracted using a customized Tcl script.
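
The two geometric measures can be computed from Cα coordinates as in the following Python sketch. The original analysis used a customized Tcl script in VMD, so this is an equivalent illustration and not the authors' script. The function names are hypothetical, and the sign of the dihedral follows the convention implemented here.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p, q):
    """Euclidean distance between two points, e.g. A22Cα and B22Cα."""
    d = _sub(p, q)
    return math.sqrt(_dot(d, d))


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points.

    For the dimer analysis the points would be the Cα atoms
    A11, A22, B22 and B11, in that order.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1u = tuple(x / n for x in b1)
    # Project the outer bond vectors onto the plane normal to the central bond.
    v = _sub(b0, tuple(x * _dot(b0, b1u) for x in b1u))
    w = _sub(b2, tuple(x * _dot(b2, b1u) for x in b1u))
    x = _dot(v, w)
    y = _dot(_cross(b1u, v), w)
    return math.degrees(math.atan2(y, x))
```

Applying these functions to every frame of a trajectory gives the time series of A22Cα-B22Cα, A11Cα-B11Cα and the dihedral angle.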

The electrostatic potential was mapped by averaging the last 
five frames of each MD trajectory using the APBS plugin [61] 
implemented in the VMD software. 

Results 

Identification, nomenclature and phylogeny of the
primate α-/θ-defensin multigene family

The primate α-/θ-defensin gene repertoires, which were
identified from the published literature [19,42,62] and exhaustive
genomic BlastN and BLAT searches, were manually checked. A
total of 144 DEFA/DEFT sequences from primates were identified,
including 105 functional genes and 39 pseudogenes. Moreover, 10
DEFA sequences from two species of treeshrews were collected and
served as closely related outgroups for the primate sequences
(Table S1). Almost one-third of the primate α-/θ-defensin genes
have lost their functional protein-coding ability because of
nonsense or frame-shift mutations that have occurred during the
process of rapid birth-and-death evolution. The translated amino
acid sequences of functional genes were aligned with MUSCLE
(Figure S1). Most of the protein sequences are computationally
translated from genome sequences rather than being demonstrated
to be produced in vivo. The multiple sequence alignment shows
that the mature peptide of α-defensins has a motif with six
conserved cysteine residues, represented as
C-x-C-x(3,4)-C-x(9)-C-x(6,9)-C-C, whereas the mature peptide of θ-defensins has a
nonapeptide motif with three conserved cysteine residues,
represented as x-C-x-C-x(4)-C, with the exception of the eastern
black-and-white colobus θ-defensin (cgue_DEFT), which has the
motif x-C-x-C-x(8)-C.
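
The cysteine motifs can be expressed as regular expressions, as in the Python sketch below. It assumes that the α-defensin motif ends in an adjacent cysteine pair, so that all six conserved cysteines are covered. The function names and the toy peptides are hypothetical, and the colobus x-C-x-C-x(8)-C variant is not handled.

```python
import re

# x(m,n) in the motif notation becomes .{m,n} in a regular expression.
ALPHA_MOTIF = re.compile(r"C.C.{3,4}C.{9}C.{6,9}CC")
THETA_MOTIF = re.compile(r".C.C.{4}C")


def has_alpha_motif(protein):
    """True if the six-cysteine alpha-defensin motif occurs in the sequence."""
    return ALPHA_MOTIF.search(protein) is not None


def has_theta_motif(nonapeptide):
    """True if a nine-residue segment matches x-C-x-C-x(4)-C exactly.

    A full match is required because the short theta pattern would
    otherwise also be found inside alpha-defensin sequences.
    """
    return THETA_MOTIF.fullmatch(nonapeptide) is not None


# Toy peptides built to satisfy the spacing rules (not real defensins).
toy_alpha = "ACAC" + "AAAA" + "C" + "A" * 9 + "C" + "A" * 9 + "CC"
toy_theta = "ACACAAAAC"
```

Here `has_alpha_motif(toy_alpha)` and `has_theta_motif(toy_theta)` are both true, while a sequence without cysteines matches neither pattern.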

To explore the evolutionary relationships among these defensin
genes, phylogenetic trees were constructed based on nucleotide
sequences from either the more conserved signal-prosegment
region or the entire coding region, and the phylogenetic trees
inferred by the BI, NJ and ML methods were combined (Figure 1).
All sequences are classified into major clades, which are referred to
as the treeshrew, prosimian and simian clades based on the species
included. In all the trees from the signal-prosegment region
(Figures 1A, S2 and S3), the simian defensins are classified into two
major clades of tissue-specific expression profiles and are thus
named simian myeloid α-defensins and simian enteric α-defensins.
In all the trees from the entire coding region (Figures 1B, S4 and
S5), all sequences are separated into two groups, with group I
containing the simian enteric α-defensins and the treeshrew clade
2 and group II containing the simian myeloid α-defensins and the
treeshrew clade 1, which indicates a possible early duplication and
divergence before the split of treeshrews and primates. However,
for the prosimian clades, phylogenetic incongruence was observed.
The tree inferred from the signal-prosegment region has two
major prosimian clades (Figure 1A), whereas the tree inferred from
the entire coding region divides the prosimian clade 2 into four
groups (Figure 1B). Because incongruence between phylogenetic
trees may result from the independence of evolutionary changes at
different sites and the homogeneity of the substitution process [63],
the phylogenetic incongruence was further assessed by reconstructing
the phylogenetic tree after removing the homogeneity
sites. Some amino acid sites in the mature peptide were identified
as homogeneity sites of convergent or parallel evolution (Figure
S6). When those sites were removed, the prosimian α-defensins
grouped together, suggesting that these convergent or parallel
evolution sites have resulted in long-branch attraction such that
distant clades grouped together.

The simian myeloid and enteric α-defensins are classified into
seven clusters, which are named following the nomenclature used
for human DEFA/DEFT genes. The simian myeloid α-defensins
contain the DEFA1/DEFA3, DEFA8/DEFA10/DEFT, DEFA4 and
DEFA7 clusters, and the simian enteric α-defensins contain the
DEFA9/DEFA11, DEFA6 and DEFA5 clusters. The DEFA7 cluster
contains only pseudogenes and is found only in Old World
monkeys and hominoids. The other six gene clusters, each
containing at least one functional gene, are functional gene
clusters. All six functional gene clusters include species from New
World monkeys, Old World monkeys and hominoids, suggesting
that the duplication and diversification of these clusters occurred
before the species differentiation of simians. The six functional
gene clusters, which diverged prior to the simian differentiation,
provide the genetic bases for the similar antimicrobial spectra in
closely related simian species.

The DEFA9/DEFA11 cluster shows different topologies among
the phylogenetic trees (Figures S2 and S4). In the NJ and ML trees
from the signal-prosegment region, the pseudogene DEFA11P and
the DEFA9 from Old World monkeys and hominoids group into
one cluster, and the DEFA9 from New World monkeys is the
outgroup. However, in the BI tree from the signal-prosegment region
and the BI/NJ/ML trees from the entire coding region, DEFA11P
and DEFA9 form separate clusters. To further clarify the
relationship between DEFA9 and DEFA11P, the sequence
alignment from the complete gene including the intron was
examined, and different patterns of sequence similarity were
observed before and after a clear boundary in the genes.
Therefore, phylogenetic trees were inferred based on either the
sequences before the boundary (part I) or the sequences after the
boundary (part II), using DEFA6 as the outgroup (Figure S7). The
trees based on part I show that DEFA11P is duplicated from
DEFA9 after the split of New World and Old World monkeys,
whereas the trees based on part II show that DEFA11P is the
outgroup of all DEFA9. The different topologies indicate that the
two parts of DEFA11P might be derived from different ancestors.
The DEFA8 from Old World monkeys and hominoids also shows
different phylogenetic positions among the phylogenetic trees
(Figures S2 and S4). In the NJ tree from the signal-prosegment
region, the DEFA8 from Old World monkeys and hominoids
clusters together with DEFA10/DEFT. However, in all the other
trees, the DEFA8 from Old World monkeys and hominoids is the
outgroup in the DEFA8/DEFA10/DEFT cluster. Similarly, different
patterns of sequence similarity were observed before and after
a clear boundary in the genes of this cluster. Phylogenetic trees
suggest that part I and part II of the DEFA8 from Old World
monkeys and hominoids might be derived from different ancestors
(Figure S8). The complicated phylogenetic relationships within
both the DEFA9/DEFA11 cluster and the DEFA8/DEFA10/DEFT
cluster imply that gene conversion or interlocus recombination
might have occurred during the evolutionary process.

Figure 1. Phylogenetic trees of the α-/θ-defensin (DEFA/DEFT) genes in primates and treeshrews. A: Phylogenetic tree of primate and
treeshrew DEFA/DEFT genes based only on the signal-prosegment region. The BI tree is selected as the background tree. The major clades or clusters
having similar topologies from all three tree-building methods (BI, NJ and ML) are combined and labeled with the BI posterior probabilities and the
bootstrap support values from the NJ and ML analyses. Primate DEFA/DEFT genes are clustered into prosimian and simian clades. The simian clades
are classified and named following the nomenclature used for the human DEFA/DEFT genes. The "P" in node labels denotes a pseudogene, and the
"U" indicates a gene with an ambiguous locus from the species in the synteny map in Figure 7. B: Phylogenetic tree of primate and treeshrew
DEFA/DEFT genes based on the entire coding region. The BI tree is selected as the background tree. The treeshrew DEFA genes are the outgroups of the two
separate clades.

doi:10.1371/journal.pone.0097425.g001

Unlike the DEFA genes, which are present in all primates,
DEFT genes are present in only Old World monkeys and certain
hominoids. Both the tree inferred from the entire coding region
and the tree inferred from the signal-prosegment region show that
DEFT groups together with DEFA10. DEFA10, which is present
only in hominoids, is the basal clade in the DEFA10/DEFT cluster
(Figure 1). The relationship between DEFA10 and DEFT and the
origin of DEFT were further analyzed using their introns. The
BI/NJ/ML trees from the introns support that DEFT and DEFA10
are grouped into separate clusters, and both clusters are derived
from the duplication of DEFA8 (Figure S9).

Sequence divergence and structural variation of simian
α-defensins

After identifying different clusters of primate α-/θ-defensins, the
pattern of natural selection operating on each cluster was
examined using the site-specific models of heterogeneous selection
pressure among sites. Likelihood ratio tests (LRTs) were carried
out between models M1a (neutral) and M2a (selection), M7 (beta)
and M8 (beta and ω, where ω is equivalent to Ka/Ks), and M8a (beta and
ω = 1) and M8 [53]. The models M2a and M8, which are
significantly favored over the other models, indicate positive
selection, and the positively selected sites with posterior probabilities
greater than 0.95 are listed (Tables 1 and S2). Positive selection
is not significantly detected for every cluster. For the two
prosimian clades and the simian DEFT, DEFA1 (including
DEFA3), DEFA5 and DEFA8 (including DEFA10) clusters,
positively selected sites are detected and are located primarily in
the mature peptide. However, the simian DEFA4, DEFA6 and
DEFA9 clusters are under purifying selection, and no positively
selected site is detected. The different selection patterns among
these clusters reflect their variant functional constraints. Therefore,
the sequence divergence and structural variations among the
six clusters (excluding the DEFA7 pseudogene cluster) were further
investigated to understand their functional diversity, which most
likely constitutes their antimicrobial spectra.

During the evolutionary process following gene duplication, due
to functional divergence, the evolutionary rate of duplicated genes
will initially increase and may subsequently shift with altered
functional constraints (type I functional divergence) or remain at
the original rate with no altered functional constraints (type II
functional divergence) [50,51,64]. Type I and type II functional
divergence patterns among simian myeloid and enteric α-defensins
were examined using the Gu99 and Gu2013 methods (Figure 2).



Statistical analysis suggests that clusters of both simian myeloid 
α-defensins and simian enteric α-defensins have significantly 
diverged (Figure 2A). The pairwise coefficients of type I functional 
divergence (θ_ij) are all significantly greater than 0 (P<0.05), except 
for the comparison between DEFA9 and DEFA6 (P = 0.07). The 
type I functional distance for each pair of clusters was estimated to 
measure whether one cluster has more shifted evolutionary rates 
than the other cluster following gene duplication [52]. The type I 
functional distance (d_F) ranges from 0.78 to 1.40 for the simian 
myeloid α-defensins (with the functional branch lengths of DEFA1, 
DEFA4 and DEFA8 being 0.94, 0.32 and 0.46, respectively) and 
from 0.69 to 3.93 for the simian enteric α-defensins (with the 
functional branch lengths of DEFA5, DEFA6 and DEFA9 being 
3.41, 0.53 and 0.17, respectively). Type I functional distances 
among enteric α-defensins are much higher than among myeloid 
ones, except for the distance between DEFA9 and DEFA6. The 
enteric α-defensins have a higher level of functional divergence, 
which reflects a more challenging microbial environment in the 
gut. Critical amino acid sites responsible for the type I and type II 
functional divergences (hereafter referred to as type I and type II 
sites, respectively) were detected (Figure 2B and C). For simian 
myeloid α-defensin clusters, the type I sites of DEFA1 are located 
in the mature peptide, whereas the type I sites of DEFA8 and 
DEFA4 are located in the signal-prosegment (Figure 2D). The 
different locations of the type I sites among simian myeloid 
α-defensin clusters suggest a specific altered functional constraint on 
the mature peptide of DEFA1. For simian enteric α-defensin 
clusters, the type I sites of DEFA5 are located in the 
signal-prosegment, whereas the type I sites of DEFA9 and DEFA6 are 
primarily located in the mature peptide. This result indicates a 
specific altered functional constraint on the signal-prosegment 
region of DEFA5. Therefore, different clusters of α-defensins have 
undergone various functional constraints and have likely been 
specialized in their antimicrobial spectra. No site corresponding to 
the type I or type II functional divergence was detected between 
simian myeloid α-defensins and simian enteric α-defensins because 
of the high level of divergence within both clades. 
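
The relationship between the pairwise distances and the functional branch lengths quoted above can be sketched as follows. The formula d_F = -ln(1 - θ_ij) is the one given in the Figure 2 legend; the additive three-cluster decomposition (each pairwise distance is the sum of two branch lengths) and all function names are assumptions made for this illustration, not necessarily the procedure of [52].

```python
import math


def type1_distance(theta: float) -> float:
    """Type I functional distance d_F = -ln(1 - theta), for 0 <= theta < 1."""
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    return -math.log(1.0 - theta)


def branch_lengths(d_ab: float, d_ac: float, d_bc: float):
    """Solve d_ab = b_a + b_b, d_ac = b_a + b_c, d_bc = b_b + b_c."""
    b_a = (d_ab + d_ac - d_bc) / 2.0
    b_b = (d_ab + d_bc - d_ac) / 2.0
    b_c = (d_ac + d_bc - d_ab) / 2.0
    return b_a, b_b, b_c


# Myeloid distances from the text: DEFA1-DEFA8, DEFA1-DEFA4, DEFA8-DEFA4.
b_defa1, b_defa8, b_defa4 = branch_lengths(1.40, 1.26, 0.78)
print(round(b_defa1, 2), round(b_defa8, 2), round(b_defa4, 2))
```

With the three myeloid distances as input, this additive decomposition returns the branch lengths reported in the text for DEFA1, DEFA8 and DEFA4.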

The dimerization and the cationic and amphipathic properties 
of the α-defensin structures are important for their antimicrobial 
activities [2,33,35-37]; hence, these characteristics of the 
representative structures from six clusters were analyzed. The α-defensin 
dimers were superimposed based on the backbone residues 18 and 
20, which form four pairs of intermolecular hydrogen bonds 
(Figure 3A and B). These superimposed dimers include the 
myeloid α-defensins DEFA1 (human HNP1), DEFA4 (human 
HNP4) and DEFA8 (modeled rhesus macaque DEFA8) and the 
enteric α-defensins DEFA5 (human HD5), DEFA6 (human HD6) 
and DEFA9 (modeled marmoset DEFA9a). The β-sheets overlap 
well, whereas the two loops connecting the β1-β2 strands and β2-β3 
strands show structural variation. Residues 11 and 22 represent 
the most distal amino acids on the two flexible loops. Thus, the 
distances of A22Cα-B22Cα and A11Cα-B11Cα, as well as the 
dihedral angle of A11Cα-A22Cα-B22Cα-B11Cα, were monitored 
in molecular dynamics simulations (see Materials and Methods) to 
assess their structural variation (Figure 3B). The distance of 

Table 1. Log-likelihood values and positively selected sites for primate DEFA/DEFT genes under site models. 

Cluster | Models (2ΔlnL) | Positively Selected Sites | P value 
DEFT (n = 16) | M1a versus M2a (12.57) | 67I 69R | <0.001 
 | M7 versus M8 (12.60) | 17Q 50D 67I 69R | 0.002 
 | M8a versus M8 (12.58) | | <0.001 
DEFA1 (n = 16) | M1a versus M2a (22.42) | 29V 67Y 74I | <0.001 
 | M7 versus M8 (23.33) | 29V 67Y 74I 86Q | <0.001 
 | M8a versus M8 (22.40) | | <0.001 
DEFA4 (n = 8) | M1a versus M2a (3.02) | | 0.082 
 | M7 versus M8 (3.32) | | 0.190 
 | M8a versus M8 (3.02) | | 0.082 
DEFA8 (n = 15) | M1a versus M2a (119.67) | 65R 67I 70R 71G 72I 75L 76L 79R 80Y 82S 84A 85F 87G 92I | <0.001 
 | M7 versus M8 (121.57) | 65R 67I 70R 71G 72I 75L 76L 79R 80Y 82S 84A 85F 87G 92I | <0.001 
 | M8a versus M8 (119.66) | | <0.001 
DEFA5 (n = 13) | M1a versus M2a (59.78) | 68T 70R 72A 73T 74R 77L 80V 82E 83I 84S | <0.001 
 | M7 versus M8 (63.85) | 25R 62A 68T 70R 72A 73T 74R 77L 80V 82E 83I 84S | <0.001 
 | M8a versus M8 (59.72) | | <0.001 
DEFA6 (n = 7) | M1a versus M2a (3.33) | | 0.068 
 | M7 versus M8 (3.35) | | 0.187 
 | M8a versus M8 (3.32) | | 0.068 
DEFA9 (n = 4) | M1a versus M2a (2.51) | | 0.113 
 | M7 versus M8 (2.52) | | 0.284 
 | M8a versus M8 (2.50) | | 0.114 
prosimian DEFA clade 1 (n = 6) | M1a versus M2a (16.11) | 69I 79R 83V 87R | <0.001 
 | M7 versus M8 (16.84) | 66H 69I 79R 83V 87R | <0.001 
 | M8a versus M8 (16.32) | | <0.001 
prosimian DEFA clade 2 (n = 14) | M1a versus M2a (76.36) | 66R 73G 79Y 87F 90L | <0.001 
 | M7 versus M8 (79.40) | 37T 66R 71R 73G 74F 78T 79Y 87F 90L | <0.001 
 | M8a versus M8 (72.84) | | <0.001 

Notes: positively selected sites with significance at the 99% level are bold, whereas the remaining sites have significance at the 95% level. The likelihood ratio tests are 
analyzed by comparing the following pairs of models: M1a-M2a, M7-M8 and M8a-M8. 
doi:10.1371/journal.pone.0097425.t001 



A22Cα-B22Cα is almost equal among the six dimers (Figure 3C), 
suggesting that the dimers are not separated into monomers and 
remain stable during the 16-nanosecond molecular dynamics 
simulation. The distance of A11Cα-B11Cα ranges from 30 Å to 
37 Å, and the dihedral angle of A11Cα-A22Cα-B22Cα-B11Cα 
ranges from 75° to 120° (Figure 3C and D), showing significant 
variations in these dimer structures and providing a structural 
foundation for their different antimicrobial activities. In addition 
to dimerization, the surface electrostatic potentials of simian 
enteric and myeloid α-defensins create distinct cationic and 
amphipathic patterns (Figure 3E), which likely affect their 
membrane depolarization and permeabilization abilities through 
electrostatic and hydrophobic interactions with microbial 
membranes. 
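
The two geometric quantities monitored above (an inter-residue Cα distance and a four-atom dihedral angle) can be computed from coordinates as in the minimal sketch below. The coordinates are hypothetical placeholders rather than values from the simulations, the function names are chosen for this example, and the sign of the dihedral depends on the convention used.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p, q):
    """Euclidean distance between two points (same units as the input)."""
    return math.dist(p, q)


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points p0-p1-p2-p3."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)   # normals of the two planes
    norm_b2 = math.sqrt(_dot(b2, b2))
    m1 = _cross(n1, tuple(x / norm_b2 for x in b2))
    return math.degrees(math.atan2(_dot(m1, n2), _dot(n1, n2)))


# Hypothetical C-alpha coordinates (in angstroms) for A11, A22, B22 and B11.
a11, a22 = (15.0, 4.0, 2.0), (3.0, 0.0, 0.0)
b22, b11 = (-3.0, 0.0, 0.0), (-14.0, -5.0, 6.0)
print(distance(a22, b22), distance(a11, b11))
print(dihedral_degrees(a11, a22, b22, b11))
```

In a trajectory analysis, the same two functions would be applied frame by frame to obtain the time series summarized in Figure 3C and D.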

Sequence profiles and structural features of θ-defensins 

The θ-defensins are cyclic octadecapeptides formed by the 
head-to-tail ligation of two nonapeptides that can be homodimeric 
or heterodimeric to increase their diversity [62,65]. The 
nonapeptide motif of θ-defensins is presented as 
RC[TVLF]C[RTGVL][RL]G[VFY]C (Figure 4A), which shows 
both a conserved cysteine pattern and a conserved hydrophilic/ 
hydrophobic pattern. The conserved cysteines at positions 2, 4 and 
9 of the consensus motif form the tridisulfide ladder that has been 
proven to be important for structural stability but not for 
antimicrobial activity [66]. Positions 1, 5 and 6 are mostly cationic 
and hydrophilic arginine residues, whereas positions 3, 7 and 8 are 
mostly hydrophobic residues. The side chains of arginine residues 
and the tridisulfide ladder are on opposite sides of the cyclic 
backbone plane and thus form a polarized structure. The 
conserved hydrophilic/hydrophobic pattern generates the cationic 
and amphipathic properties, which might play a major role in the 
membrane depolarization and permeabilization of target cells or 
enveloped viruses. The cationic and amphipathic properties of 
θ-defensins, like those of α-defensins, have been reported to be 
important for antimicrobial activities by disrupting microbial 
membrane structures [2,33]. 
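
To illustrate how pairing nonapeptides increases diversity, the sketch below enumerates the distinct cyclic products that a set of nonapeptides could form and gives a crude net charge for each. Treating the products A+B and B+A as the same ring (because the backbone is cyclic), and estimating charge by simply counting R/K and D/E residues, are simplifying assumptions of this example; the two sequences are the ones quoted in the text for the homodimers.

```python
from itertools import combinations_with_replacement


def net_charge(sequence: str) -> int:
    """Crude net charge: +1 per R or K, -1 per D or E (ignores pH effects)."""
    positive = sum(sequence.count(aa) for aa in "RK")
    negative = sum(sequence.count(aa) for aa in "DE")
    return positive - negative


def cyclic_products(nonapeptides):
    """Unordered pairs of nonapeptides, written as linearized 18-mers."""
    return [a + b for a, b in combinations_with_replacement(nonapeptides, 2)]


nonapeptides = ["RCICRRGVC", "RCVCTRGVC"]
for octadecapeptide in cyclic_products(nonapeptides):
    print(octadecapeptide, net_charge(octadecapeptide))
```

With n nonapeptides this gives n homodimers plus n(n-1)/2 heterodimers, which is why even a few precursor genes can yield several distinct cyclic peptides.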

Sites 3I and 5R in the nonapeptides (corresponding to 67I and 
69R in the full sequence) were found to be under significant 
positive selection (Table 1). The surface electrostatic potentials for 
two types of θ-defensin homodimers, (RCICRRGVC)2 that is 
under positive selection and (RCVCTRGVC)2 that is not under 

[Figure 2, panel A: statistics of type I and type II functional divergence] 

Gene clusters (i, j) | Type I θ_ij±SE | LRT | P value | d_F | n_e | Type II θ_ij±SE | n_e 
Simian myeloid α-defensins 
DEFA1, DEFA8 | 0.74±0.15 | 24.71 | <0.0001 | 1.40 | 8 | 0.20±0.17 | 1 
DEFA1, DEFA4 | 0.72±0.29 | 6.53 | 0.01 | 1.26 | 2 | 0.09±0.15 | 
DEFA8, DEFA4 | 0.54±0.17 | 10.30 | 0.001 | 0.78 | 3 | 0.34±0.16 | 4 
Simian enteric α-defensins 
DEFA6, DEFA5 | 0.98±0.25 | 15.11 | <0.0001 | 3.93 | 7 | 0.13±0.16 | 
DEFA9, DEFA5 | 0.97±0.21 | 22.22 | <0.0001 | 3.57 | 8 | 0.22±0.15 | 2 
DEFA9, DEFA6 | 0.50±0.29 | 3.22 | 0.07 | 0.69 | 1 | 0.20±0.11 | 4 

[Figure 2, panels B-D: alignments of the critical type I and type II sites for the simian myeloid clusters DEFA1, DEFA8 and DEFA4 (B) and the simian enteric clusters DEFA5, DEFA6 and DEFA9 (C), and a diagram of the site locations (D)] 

Figure 2. Type I and type II functional divergences among simian enteric or myeloid α-defensin clusters. A: Statistical analyses of the 
type I and type II functional divergences. Clusters within both simian enteric α-defensins and simian myeloid α-defensins have diverged. Pairwise 
coefficients (θ_ij±SE) and likelihood ratio statistics (LRT) are analyzed using the bootstrapped Gu99 method. Type I functional distance (d_F) between 
clusters is calculated as d_F = -ln(1 - θ_ij). The effective number of functional divergence related sites (n_e) is estimated following the Gu2013 method. B: 
Critical amino acid sites responsible for type I and type II functional divergences from simian myeloid α-defensin clusters. These sites are identified 
based on the n_e cutoff after ranking posterior probabilities. C: Critical amino acid sites responsible for type I and type II functional divergences from 
simian enteric α-defensin clusters. D: The diagram of the sites responsible for type I and type II functional divergences in different simian clusters. 
Closed circles represent sites responsible for type I functional divergence, and open circles denote sites responsible for type II functional 
divergence. 

positive selection (Figure 4B), were analyzed based on molecular 
dynamics simulations. The surface electrostatic potentials exhibit 
differences that may affect the polarization of θ-defensin, 
suggesting that the positively selected sites are related to the 
antimicrobial function of θ-defensin. 

Figure 3. Structural flexibility and surface electrostatic potential for the simian myeloid and enteric α-defensin dimers. A: 
Superimposed α-defensin dimers for the myeloid DEFA1, DEFA4 and DEFA8 proteins as well as the enteric DEFA5, DEFA6 and DEFA9 proteins. B: 
Diagram illustrating the intermolecular hydrogen bonds, the two distances (A22Cα-B22Cα and A11Cα-B11Cα) and the dihedral angle 
(A11Cα-A22Cα-B22Cα-B11Cα). C: Plots of the A22Cα-B22Cα and A11Cα-B11Cα distances during the 16 nanoseconds of molecular dynamics simulations (upper panel) 
and plots of the average A22Cα-B22Cα and A11Cα-B11Cα distances (lower panel). Similar A22Cα-B22Cα distances indicate that the dimer structures are 
well maintained during simulation processes, whereas diverse A11Cα-B11Cα distances reflect the flexibility of the dimers. Error bars represent 
standard deviations. D: Plots of the dihedral angle A11Cα-A22Cα-B22Cα-B11Cα. The variant dihedral angles among different clusters of myeloid and 
enteric α-defensins indicate different dimer topologies. E: Surface electrostatic potential generated using the smoothed trajectory from the last five 
frames of molecular dynamics simulations. The electrostatic potential (±10 kT/e) is colored red (-) or blue (+). The left and right views of each 
structure are the same as in panel A. 
doi:10.1371/journal.pone.0097425.g003 



Adaptive changes in the prosegment coevolving with 
the new protein fold of the mature peptide 

The θ-defensins are the youngest members of the defensin 
family. A premature stop codon mutation in the mature peptide 
gave rise to the θ-defensin, which contains a new protein fold 
that is different from that of α-defensins. The selection patterns 
along the length of DEFA and DEFT were examined to check 
whether there were any adaptive changes in the signal-prosegment 
region that are related to the new protein fold. For this purpose, a 
sliding window method for the amino acid identity and the Ka/Ks 
was used to analyze the selection pressures on different regions of 
all DEFA, DEFT and separate DEFA clusters (Figure 5, Table S3). 
Furthermore, bootstrap tests for the Ka/Ks were performed to 
assess the effect of potential biases from a few genes in the gene set 
and to test the significance of the difference between the average 
Ka/Ks ratio and 1. 
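
The sliding-window layout and the bootstrap test described above can be sketched as follows. The window width of 10 sites and step of 5 are inferred from the fragments named in the text (for example 1-10, 6-15 and 26-35) and are assumptions, as are the function names and the choice to resample precomputed per-gene Ka/Ks values; the Ka/Ks values in the usage lines are hypothetical. Estimating Ka and Ks themselves requires a codon substitution method and is outside this sketch.

```python
import random


def sliding_windows(n_sites: int, width: int = 10, step: int = 5):
    """1-based inclusive (start, end) windows: 1-10, 6-15, 11-20, ..."""
    return [(start, start + width - 1)
            for start in range(1, n_sites - width + 2, step)]


def bootstrap_p_above_one(ratios, n_boot: int = 1000, seed: int = 0) -> float:
    """One-sided bootstrap P value for mean Ka/Ks > 1.

    Resamples the ratios with replacement and returns the fraction of
    resampled means that are <= 1.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(ratios)
    at_or_below_one = 0
    for _ in range(n_boot):
        sample = [ratios[rng.randrange(n)] for _ in range(n)]
        if sum(sample) / n <= 1.0:
            at_or_below_one += 1
    return at_or_below_one / n_boot


print(sliding_windows(30))
# Hypothetical Ka/Ks values for one window across five genes.
print(bootstrap_p_above_one([1.6, 1.3, 1.9, 1.2, 1.5]))
```

Resampling over genes in this way also shows how strongly the window average depends on a few sequences, which is the bias the bootstrap test is meant to assess.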

The amino acid identities decrease from the signal peptide and 
prosegment to the mature peptide for both DEFA and DEFT 
(Figure 5A and B). The signal peptide is the most conserved 
region, whereas the mature peptide is highly divergent. This result 
also supported our rationale for inferring a phylogenetic tree using 
the more conserved signal-prosegment region (Figure 1A). 

The Ka/Ks ratios are not homogeneous among the sliding 
windows for DEFA, DEFT and separate DEFA clusters, indicating 
that varying selection pressures act on different fragments. The 
signal peptides are found to be under purifying selection (Ka/Ks<1) 
for both DEFA and DEFT. When the DEFA clusters are 
analyzed separately, only the DEFA5 cluster (sites 6-15) and the 
prosimian DEFA clade 1 (sites 1-10) show positive 
selection in the signal peptides. In contrast, the mature peptides are 
under positive selection (Ka/Ks>1) for both DEFA and DEFT. 
For separate simian DEFA clusters, the mature peptides of DEFA5 
and DEFA8 are under strong positive selection. Positive selection 
is also detected in the mature peptides of the prosimian DEFA 
clade 1 and clade 2. The positive selection acting on 
the mature peptides is likely due to the strong selection pressure 
through their direct interactions with variable microbes. 

The prosegments of the DEFA and the DEFT are under 
different selection pressures. For DEFA, the fragment 56-65 in the 
prosegment is under positive selection (Ka/Ks = 1.46, bootstrap 
test Ka/Ks significantly >1, P<0.01) (Figure 5A), whereas 
positive selection is not detected in the same fragment from 
separate simian DEFA clusters (Figure 5C). The fragment 56-65 is a 
crucial region that harbors several posttranslational cleavage sites 
[31,67,68]. The positive selection acting on this cleavage region of 
DEFA may be explained by the diverse splicing mechanisms 
among different DEFA clusters, which are conserved within each 
cluster. The prosimian clade 1 and clade 2 are also found to be 
under positive selection in the cleavage region, indicating that both 
clades contain divergent α-defensins. 

For DEFT, the fragment 41-50 in the prosegment is under 
strong positive selection (Ka/Ks = 1.68, Ka/Ks significantly >1, 
P<0.05), compared to the corresponding fragment of DEFA (Ka/ 
Ks significantly <1, P<0.01) and the closely related DEFA8 (Ka/ 
Figure 4. Sequence motifs and electrostatic features of primate 
cyclic θ-defensins. A: The θ-defensin octadecapeptide is formed by the 
head-to-tail ligation of two nonapeptides. The sequence logo for the 
nonapeptide is generated using all the primate θ-defensins. Sites 3I and 
5R are under positive selection and are marked with asterisks. B: Surface 
electrostatic potential of the θ-defensins without or with the positively 
selected sites (3I and 5R). 
doi:10.1371/journal.pone.0097425.g004 

Ks = 1.11, Ka/Ks not significantly >1, P = 0.16). However, the 
prosegment of DEFA is detected with positive selection in another 
region, namely, the fragment 26-35 (Ka/Ks = 1.26, Ka/Ks 
significantly >1, P<0.01). The distinct positive selection in the 
prosegments of DEFA and DEFT indicates a specific change in the 
prosegment of DEFT. Because the prosegment is important for the 
correct folding of defensins [29], it is most likely that the 
prosegment of θ-defensins has undergone adaptive changes that 
are related to the new protein fold. The diagram in Figure 6 
presents our proposed hypothesis for this evolutionary pattern and 
illustrates the adaptive evolution of the prosegment along with the 
new protein fold of the mature peptide. 

Genetic hitchhiking of the pseudogene DEFTP and loss of 
functional θ-defensin in humans and chimpanzees 

To understand the evolutionary dynamics of the α-/θ-defensin 
multigene family, the comparative synteny map from five simian 
species, including humans, chimpanzees, orangutans, macaques 
and marmosets, was analyzed and was combined with the 
phylogenetic results. The primate DEFA/DEFT genes originate 
from tandem duplication and are located in the regions 
homologous to the human chromosome 8p23 (Figure 7A). 
Conserved synteny relationships are observed among the five 
simian species. The DEFA5 orthologs are located at the 5' 
terminus, and the DEFA9, DEFA8, DEFA4 and DEFA6 orthologs 
are located at the 3' terminus in a 5' to 3' order. The simian 
myeloid and enteric DEFA genes, which are clustered into distinct 
clades in both phylogenetic trees (Figure 1), are not located as two 



clearly separated blocks in the synteny map. Loci of the genes from 
the two major clades (simian enteric and myeloid defensins) are 
intermingled, implying that duplication of a segment containing 
two or more genes has occurred. Orthologs from some species 
have mutated into pseudogenes, which include DEFA6P in the 
marmoset; DEFA9P in the macaque, orangutan, chimpanzee and 
human; and DEFA8P in the orangutan, chimpanzee and human. 
In the phylogenetic trees, genes from the same species group 
together in each simian cluster (Figure 1), indicating species- 
specific duplication, which is a common phenomenon during the 
birth-and-death evolution of multigene families. In the synteny 
map (Figure 7A), species-specific duplication is not homogeneous 
across this region, with the middle section having more species- 
specific duplications. Based on the current versions of assembled 
genomes, the marmoset has multiple duplications of DEFA8 and 
DEFA9, and the macaque has multiple duplications of DEFT. In 
the orangutan, chimpanzee and human, there are several species- 
specific duplications of DEFA1 and DEFA10/DEFT. 

A nonsense mutation at codon 17 of DEFT results in its loss of 
function in orangutans, chimpanzees and humans (Figure 7B), and 
these DEFTP pseudogenes include hspi_DEFT1P, hspi_DEFT2P, 
ptro_DEFT1P, ptro_DEFT2P and pabe_DEFT4P (excluding 
the pabe_DEFT2P pseudogene, which is generated by 
frameshift mutations). In humans, chimpanzees and gorillas, the DEFT 
has been lost, and only DEFTP pseudogenes have been 
maintained. Our results demonstrate that θ-defensin had a 
conserved structural motif and was under positive selection. 
Therefore, the driving force for the expansion of the pseudogene 
DEFTP and for the loss of functionally important θ-defensins in 
humans and chimpanzees was investigated. A less-is-hitchhiking 
hypothesis was proposed based on the synteny map and 
phylogenetic relationships. The synteny map suggests that the 
loci of the DEFA1 and the DEFT/DEFA10 are linked and have 
been duplicated together in orangutans, chimpanzees and humans 
(Figure 7A). The genes of the DEFA1 cluster are all functional and 
are under positive selection; thus, we infer that each box in the 
synteny map contains a driver from the DEFA1 cluster and a 
hitchhiker from the DEFT/DEFA10 cluster. The orangutan has 
multiple copies of functional DEFT as well as the pseudogene 
pabe_DEFT4P containing the nonsense mutation at site 17 
(Figure 7A and B). This result suggests that, in the ancestor of 
the orangutan, chimpanzee and human, the selection pressure on 
the multiple-copied DEFT was relaxed and the pseudogene 
DEFTP (containing the nonsense mutation at site 17) emerged. 
Both DEFT and DEFTP are linked to DEFA1 and can expand, but 
only DEFTP (containing the nonsense mutation at site 17) is 
maintained, whereas DEFT is lost in the chimpanzees and humans 
(Figure 7A). The retention of DEFTP and loss of DEFT might be 
independent of their own fitness and depend instead on the fate of the 
driver. The pabe_DEFT4P is linked to the pabe_DEFA1g, which 
harbors a significantly positively selected site whose expansion 
might be advantageous: 74I, adjacent to the third cysteine. This 
74I is not present in other pabe_DEFA1 (a, b, c, d and f) sequences 
(Table 1, Figure S1). Therefore, the DEFTP is linked to a DEFA1, 
whose expansion might be advantageous. 
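
As a small illustration of how a nonsense mutation at a given codon is located in a coding sequence, the sketch below scans an in-frame sequence for the first standard stop codon. The function name and the toy sequence are hypothetical (the sequence is constructed only so that the stop falls at codon 17) and do not represent any real DEFT or DEFTP sequence.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_stop_codon(cds: str):
    """Return the 1-based codon index of the first in-frame stop, or None."""
    cds = cds.upper()
    for offset in range(0, len(cds) - 2, 3):
        if cds[offset:offset + 3] in STOP_CODONS:
            return offset // 3 + 1
    return None


# Toy sequence: start codon, 15 alanine codons, a stop at codon 17, then more codons.
toy_cds = "ATG" + "GCT" * 15 + "TGA" + "GCT" * 3
print(first_stop_codon(toy_cds))
```

Applying such a scan to aligned orthologs shows whether the pseudogenes share the same stop position, which is the pattern the text interprets as a common origin.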

The less-is-hitchhiking hypothesis depicts the evolutionary 
process of the expansion of the pseudogene DEFTP and the loss 
of functional θ-defensin in humans and chimpanzees (Figure 8). 
With the expansion of the gene (driver) under positive selection, 
the adjacent linked gene (hitchhiker) is duplicated in the process of 
segmental duplication. The pseudogenization of the hitchhiker 
occurs in an ancestral species following the duplication process. 
During the birth-and-death evolution, the hitchhiker that gains 
more copies may be either the pseudogene or the functional gene, 
Figure 5. Sliding window analyses of the Ka/Ks for all DEFA, DEFT and separate DEFA clusters. A: Sliding window analysis of the amino acid 
identity and Ka/Ks for all DEFA. The sequence conservation decreases from the signal peptide and prosegment to the mature peptide. The selection 
pressures acting on different windows are different as indicated by the Ka/Ks ratios. Fragment 26-35 in the prosegment region, fragment 56-65 in 
the cleavage region and most fragments in the mature peptide are under positive selection (Ka/Ks>1). The significance of Ka/Ks>1 is tested by a 
bootstrap method and is indicated by * (P<0.05) or ** (P<0.01). B: Sliding window analysis of the amino acid identity and Ka/Ks for DEFT. The 
sequence conservation also decreases from the signal peptide and prosegment to the mature peptide. The selection pressure on different regions is 
also variable. Compared to the same region of all DEFA and that of DEFA8, fragment 41-50 in the prosegment of DEFT is under strong positive 
selection. C: Sliding window analysis of Ka/Ks for each simian DEFA cluster. Only the DEFA5 and DEFA8 clusters contain fragments that are under 
positive selection. D: Sliding window analysis of Ka/Ks for the two prosimian clades. The fragment 56-65 in the cleavage region is also under positive 
selection for both prosimian clades, consistent with that of all DEFA. 
doi:10.1371/journal.pone.0097425.g005 
Figure 6. Coevolution of the prosegment with the new protein 
fold of the mature peptide. The ancestral protein fold refers to the 
α-defensin mature peptide, and the new protein fold refers to the 
θ-defensin mature peptide. Because the prosegment is important for the 
correct folding of the mature peptide, the prosegment coevolves with 
the mature peptide through adaptive changes. 
doi:10.1371/journal.pone.0097425.g006 

which is determined by the fate of its adjacent driver. The former 
scenario is the less-is-hitchhiking hypothesis. The less-is-more, less- 
is-less and less-is-nothing hypotheses depend on the phenotypic 
change of the loss-of-function variant, which can be advantageous, 
deleterious or tolerated, respectively. However, the less-is-hitch- 
hiking hypothesis provides another explanation for the driving 
force of the loss-of-function variant expansion that does not 
involve phenotypic consequences. The fate of the loss-of-function 
hitchhiker is determined by the surrounding genomic environment 
of a driver gene (i.e., an adjacent gene under positive selection). 
The loss-of-function hitchhiker can be advantageous, deleterious 
or tolerated. 

Discussion 

In this study, we collected the primate DEFA/DEFT gene 
repertoires and performed extensive analyses of the primate 
α-/θ-defensin multigene family. Through systematic phylogenetic 
analyses, a detailed classification and nomenclature of the primate 
α-/θ-defensins was provided. The classification of simian 
α-/θ-defensins based on phylogenetic trees is related to their expression 
patterns in myeloid and enteric tissues. Further phylogenetic 
classification of simian α-defensins into six functional gene clusters 
corresponds to their functional divergence. In a previous study, 
this multigene family was classified into three classes based on 
neighbor-joining (NJ) and maximum parsimony (MP) trees using 
genes from simians that included human, chimpanzee, orangutan, 
macaque and marmoset sequences as well as one mouse 
α-defensin sequence as the outgroup [42]. A recent study focused on 
the phylogenetic relationships of the α-, β- and θ-defensins using 
DEFA/DEFT sequences from 10 primate species and two human 
DEFB sequences as the outgroup based on the NJ method [69]. In 
Figure 7. Hitchhiking of DEFT/DEFTP during the birth-and-death 
evolution of the primate DEFA/DEFT multigene family. A: 
Comparative synteny map of the DEFA/DEFT gene loci in humans, 
chimpanzees, orangutans, macaques and marmosets. The boxes 
highlight the two genes that are linked and duplicate together, 
including a driver from the DEFA1 cluster and a hitchhiker from the 
DEFT/DEFA10 cluster. Arrowheads indicate transcriptional orientation. 
Pseudogenes are in white, and functional genes are in black. The 
dashed-line box includes the pseudogene DEFTP containing the 
nonsense mutation at site 17. B: The NJ tree based on the introns of 
the boxed DEFT/DEFTP from humans (hspi), chimpanzees (ptro), 
orangutans (pabe) and macaques (mmul). The nonsense mutations (*) 
at codon site 17 of the pseudogenes hspi_DEFT1P, hspi_DEFT2P, 
ptro_DEFT1P, ptro_DEFT2P and pabe_DEFT4P are underlined with 
dashed lines, suggesting that these DEFTP pseudogenes are derived 
from a common ancestor. 
doi:10.1371/journal.pone.0097425.g007 



our study, we identified 144 primate DEFA/DEFT sequences from 
16 species of simians and prosimians as well as 10 DEFA sequences 
from two treeshrews, which are closely related to primates. We 
used sequence alignments of both the more conserved 
signal-prosegment region and the entire coding region to more precisely 
reconstruct α-/θ-defensin orthologous relationships and to 
uncover their potential functional divergence. 

The primate DEFA/DEFT sequences that are used for this study 
were collected from the public database, which may contain 
sequencing and assembly errors or uncertainties. One such 
ambiguous sequence might be the pabe_DEFA10c sequence, which 
is highly similar to the pabe_DEFT sequences and is 
phylogenetically far from the DEFA10 cluster (Figure 1A and S8). This 
sequence is named DEFA10 only because site 77 is not a 
premature stop codon. Thus, pabe_DEFA10c was excluded in the 
phylogenetic analysis based on the introns from DEFA8/DEFA10/ 
DEFT (Figure S9). For the analysis of positive selection using 
PAML, sequencing and assembly errors may cause bias. Our 
results of PAML analysis for separate clusters show that most of 
the positively selected sites are located in the mature peptide, 
similar to the results of a previous study that analyzed all the DEFA 
sequences together [42]. For the sliding window analysis of Ka/ 
Ks, we used a bootstrap method to reduce the effect of potential 
bias from sequencing and assembly errors in a few genes. For 
the synteny analysis, there are sequences (cjac_DEFA9cU and 
cjac_DEFA9dU) that are not mapped onto the chromosomes of the 
marmoset. In macaques, several sequences (mmul_7bUP, 
mmul_DEFA5bU, mmul_DEFA5cU, mmul_DEFA5dU and mmul_DEFA5eU) 
are mapped onto a chromosome region that is quite far from the 
defensin gene locus presented in the synteny map. The human, 
chimpanzee, orangutan, macaque and marmoset genomes were 
selected for the synteny analysis because of their more complete 
Figure 8. The less-is-hitchhiking hypothesis during the 
birth-and-death evolution of multigene families. With the expansion of 
the gene (driver) under positive selection, the adjacent linked gene 
(hitchhiker) is duplicated in the process of segmental duplication. 
Pseudogenization of the hitchhiker occurs in an ancestral species 
following the duplication process. During the process of 
birth-and-death evolution, the hitchhiker that gains more copies can be either the 
pseudogene or the functional gene with the other one being lost, and 
the gain or loss of the hitchhiker is determined by the fitness of its 
adjacent driver. The expansion of the pseudogene and loss of the 
functional gene scenario is defined as the less-is-hitchhiking hypothesis. 
doi:10.1371/journal.pone.0097425.g008 



assemblies in the genomic region harboring the DEFA1 and 
DEFA8/DEFA10/DEFT gene cluster, whereas the gorilla and 
gibbon genomes were not used due to the poor assembly in this 
region. 

The functional shifting of simian oc-defensins, which is implied 
by their sequence divergence, structural variation and selection 
pattern differences among the six functional gene clusters, most 
likely corresponds to differentiated antimicrobial activity. Howev- 
er, orthologs within the same gene cluster from different species 
are still under functional constraints, most likely due to their 
similar antimicrobial mechanisms and challenges from similar 
types of pathogens in closely related species. Based on these results, 
we speculate that these defensin clusters constitute a wide range of 
antimicrobial spectra and that each cluster occupies one part of 
the spectra. Within each cluster, there is species-specific cluster 
duplication or pseudogenization following the species diversifica- 
tions of simians, which implies that the antimicrobial spectra are 
continuously changing. At the same time, deconstructing the 
undocumented antimicrobial spectra of α-/θ-defensins is not 
feasible, even for the most closely related species, because of the 
countless types of pathogens and the frequent duplication and 
divergence of defensins. Many experimental studies have shown 
that members of this family have different antimicrobial activities 
and are effective against different groups of microbes. For 
example, the antibacterial activity and specificity of the six human 
α-defensins against Gram-positive and Gram-negative bacteria are 
different [70]. Among human myeloid defensins, HNP1-3 
are more potent than HNP4 against Gram-positive 
bacteria such as S. aureus, whereas HNP4 is more potent 
against Gram-negative bacteria such as E. coli and E. aerogenes. Among 
human enteric defensins, HD5 is effective against both Gram-negative 
and Gram-positive bacteria, whereas HD6 shows 
significantly lower antimicrobial activity against both groups 
than HD5. The myeloid α-/θ-defensins 
and the enteric HD5 have also been reported to have 
inhibitory potential against enveloped and non-enveloped viruses, 
such as the herpes simplex virus [71], the human immunodefi- 
ciency virus [72,73], papillomaviruses [74], adenoviruses [75], 
polyomaviruses [76], the SARS coronavirus [77], influenza viruses 
[78,79] and others. In addition to its role in host defense, HD5 also 
has a homeostatic role in establishing and maintaining the 
intestinal microbiota [80]. Overall, these experimental studies 
indicate that the variable antimicrobial abilities of mammalian 
α-/θ-defensins compose an antimicrobial spectrum. Defensins with 
continuously shifting antimicrobial activity, together with other 
versatile antimicrobial factors, evolve to act as the first line of 
defense of the innate immune system and play important roles in 
the early host defense. Because of the therapeutic potential of 
antimicrobial peptides, synthetic defensins are being developed as 
peptide drugs, such as fusion inhibitors. Exploring the natural 
defensin repertoires, especially in species closely related to 
humans, will help us understand the stability and plasticity of 
α-/θ-defensins and aid in designing potent peptide drugs. 

The sliding window analysis of DEFT (Figure 5B) suggests that 
fragment 41-50 in the prosegment and fragment 66-75 in the 
mature peptide are under positive selection. These two fragments 
contain the positively selected sites detected by PAML (50D, 67I 
and 69R, corresponding to the sliding window sites 49, 66 and 68, 
respectively). Other amino acid changes that may contribute to the 
high Ka/Ks value are 41G/E, 45S/A and 48R/W in fragment 
41-50 as well as 69R/L, 71V/I/F and 73R/Q in fragment 66-75 
(Figure S1). To be spliced into an 18-residue mature θ-defensin, 
the three C-terminal residues must be removed during the 
maturation process [18]. The three C-terminal residues were not 
found to be under positive selection using the PAML software. 
However, the three C-terminal residues of θ-defensin show a 
pattern of RLL or QLL that is quite different from the α-defensins 
(Figure S1), which might be related to their involvement in the 
maturation process of θ-defensins. 
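
The window-based scan discussed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used to produce Figure 5B: the function name sliding_ka_ks, the window length of 10 codons, the step of 1 codon and the use of uncorrected proportions (differences divided by sites, with no multiple-hit correction) are all hypothetical choices. The per-codon counts of nonsynonymous and synonymous differences and sites are assumed to have been computed beforehand by a codon-based counting method.

```python
from typing import List, Optional, Tuple


def sliding_ka_ks(
    nonsyn_diffs: List[float],
    syn_diffs: List[float],
    nonsyn_sites: List[float],
    syn_sites: List[float],
    window: int = 10,  # assumed window length in codons
    step: int = 1,     # assumed step in codons
) -> List[Tuple[int, int, Optional[float]]]:
    """Return (start, end, Ka/Ks) for each window, 1-based inclusive codons.

    Ka and Ks are uncorrected proportions (differences / sites) summed over
    the window. The ratio is None when it is undefined, i.e. when the window
    has no sites of one class or no synonymous differences.
    """
    n = len(nonsyn_diffs)
    if not (len(syn_diffs) == len(nonsyn_sites) == len(syn_sites) == n):
        raise ValueError("all per-codon lists must have the same length")

    results = []
    for start in range(0, n - window + 1, step):
        end = start + window
        n_sites = sum(nonsyn_sites[start:end])
        s_sites = sum(syn_sites[start:end])
        s_diffs = sum(syn_diffs[start:end])
        if n_sites == 0 or s_sites == 0 or s_diffs == 0:
            ratio = None
        else:
            ka = sum(nonsyn_diffs[start:end]) / n_sites
            ks = s_diffs / s_sites
            ratio = ka / ks
        results.append((start + 1, end, ratio))
    return results


# Toy input (hypothetical numbers, not data from this study): 12 codons.
toy_windows = sliding_ka_ks([1.0] * 12, [0.25] * 12, [2.0] * 12, [1.0] * 12)
```

Windows with a ratio above 1 would be candidates for positive selection, but any such reading would still need a significance test, which this sketch does not perform.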

Multigene families play important roles in the immune system, 
the sensory system, development and other processes. These 
families can express gene products at high levels, such as highly 
conserved histones and nuclear ribosomal RNAs, and produce 
proteins with diverse functional spectra, such as α-defensins, major 
histocompatibility complex proteins, immunoglobulins, chemore- 
ceptors and olfactory receptors. The evolutionary processes of 
multigene families have been explained by birth-and-death 
evolutionary models [43]. Frequent gene conversion, interlocus 
recombination, gene duplication and pseudogenization are 
involved in the evolutionary processes of multigene families. 
Strong purifying selection or positive selection can also act on 
multigene families to conserve gene function or give rise to new 
adaptive phenotypes. The expansion of genes under selection 
through segmental duplications may impact the fate of the 
adjacent genes linked with them, which can also affect the 
evolution of multigene families. Our less-is-hitchhiking hypothesis 
describes this phenomenon, in which the retention of pseudogenes and 
the loss of functional genes are determined by the fate of the 
adjacent gene during the birth-and-death evolutionary process. 
We believe that the expansion of the DEFTP pseudogene and the 
loss of functional θ-defensins in humans and chimpanzees is a 
representative case for the less-is-hitchhiking hypothesis based on 
the following criteria. First, the driver and the hitchhiker belong to 
phylogenetically separate clusters. Second, the driver and the 
hitchhiker are genetically linked and duplicate together on the 
chromosome. Third, positive selection acts on the functional 
drivers. Lastly, the pseudogene hitchhiker expands and the 
functional hitchhiker has been lost, as determined by the fitness 
of the driver. 

Recently, it has been demonstrated that genetic hitchhiking is 
pervasive and the mutational cohort that includes both the driver 
and the hitchhikers drives adaptation [81]. Although these 
hitchhikers mostly refer to point mutations, there are few reports 
of an entire gene being a hitchhiker, with one exception reported 
for the yellow monkey flower (Mimulus guttatus), in which a copper 
tolerance locus under selection and its tightly linked hybrid 
incompatibility locus spread to fixation in a copper mine 
population by genetic hitchhiking [82,83]. Future research on 
the evolution of genetic hitchhiking involving two or more closely 
linked genes from both case studies and whole-genome compar- 
isons will help to uncover the adaptation of complex traits from 
linked genes and to understand the genetic and evolutionary basis 
of certain disease-related traits during the hitchhiking processes. 

Supporting Information 

Figure S1 Multiple sequence alignment of the primate 
α-/θ-defensins. Most of the protein sequences are computationally 
translated from genome sequences rather than being 
demonstrated to be produced in vivo. The mature peptide of α-defensins 
has a motif with six conserved cysteine residues, 
represented as C-x-C-x(3,4)-C-x(9)-C-x(6,9)-C-C. A nonsense mutation 
at position 77 (Q77*) generates a θ-defensin precursor with 
three conserved cysteines. The mature peptide of θ-defensins has a 
nonapeptide motif with three conserved cysteine residues, 
represented as x-C-x-C-x(4)-C, with the exception of the Eastern 
black-and-white colobus θ-defensin (cgue_DEFT), which has the 
motif x-C-x-C-x(8)-C. The nonsense mutation at position 17 
(Q17*) results in the loss of the downstream protein-coding ability 
of the θ-defensins. The initial methionines (M) and conserved 
cysteines (C) in the alignment are highlighted in blue. The 
translational ends (*) of the θ-defensin are highlighted in magenta. 
The blocks highlight the sliding windows of fragments 41-50 and 
65-75 of DEFT, which are detected with Ka/Ks >1 (P<0.05) in 
Figure 5. The question marks indicate ambiguous amino acids 
because of the single nucleotide polymorphisms from the genome 
sequences. The genus and species for each abbreviation can be 
found in Table S1. 
(PDF) 
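
The cysteine motifs in this legend are written in a PROSITE-like notation, which can be turned into regular expressions for scanning translated sequences. The sketch below is illustrative only: motif_to_regex is a hypothetical helper that handles just the elements used here (a single residue letter or x, optionally followed by a count or a range), the α-defensin motif is written with the terminal C-C pair so that it contains six cysteines, and the test strings are synthetic placeholders rather than real defensin sequences.

```python
import re


def motif_to_regex(motif: str) -> str:
    """Translate a PROSITE-like motif such as 'x-C-x(3,4)-C' into a regex.

    Supported elements: one letter (x = any residue), optionally followed by
    '(n)' or '(n,m)' for a repeat count or range.
    """
    parts = []
    for element in motif.split("-"):
        m = re.fullmatch(r"([A-Za-z])(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported motif element: {element!r}")
        residue, low, high = m.groups()
        token = "[A-Z]" if residue in "xX" else residue.upper()
        if low is not None:
            token += "{" + low + ("," + high if high else "") + "}"
        parts.append(token)
    return "".join(parts)


# Motifs as given in the legend (alpha motif with the terminal C-C pair).
ALPHA_MOTIF = "C-x-C-x(3,4)-C-x(9)-C-x(6,9)-C-C"
THETA_MOTIF = "x-C-x-C-x(4)-C"
COLOBUS_THETA_MOTIF = "x-C-x-C-x(8)-C"

alpha_re = re.compile(motif_to_regex(ALPHA_MOTIF))
theta_re = re.compile(motif_to_regex(THETA_MOTIF))
colobus_re = re.compile(motif_to_regex(COLOBUS_THETA_MOTIF))

# Synthetic placeholder peptides built only to exercise the patterns.
toy_alpha = "CAC" + "A" * 4 + "C" + "A" * 9 + "C" + "A" * 9 + "CC"
toy_theta = "ACAC" + "A" * 4 + "C"
toy_colobus = "ACAC" + "A" * 8 + "C"
```

Using re.search on a full precursor sequence would locate the motif inside it, while re.fullmatch checks that a candidate mature peptide consists of the motif alone.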

Figure S2 Phylogenetic trees of primate and treeshrew 
DEFA/DEFT genes based on the signal-prosegment 
region. The trees are built using the (A) Bayesian inference 
(BI), (B) neighbor-joining (NJ) and (C) maximum likelihood (ML) 
methods. The BI tree is labeled with posterior probabilities. The 
NJ and ML trees are labeled with bootstrap support values. All 
three trees are drawn to scale, with branch lengths proportional to 
the estimated evolutionary distances. "P" in node labels denotes a 
pseudogene. 
(PDF) 

Figure S3 Phylogenetic tree of primate and treeshrew 
DEFA/DEFT genes based on the signal-prosegment 
region. The BI tree is selected as the background tree. The 
major clades or clusters having similar topologies from all three 
tree-building methods (BI, NJ and ML) are combined and labeled 
with the BI posterior probabilities and the bootstrap support 
values from the NJ and ML analyses. 
(PDF) 

Figure S4 Phylogenetic trees of primate and treeshrew 
DEFA/DEFT genes based on the entire coding region. 

The trees are built using the BI (A), NJ (B) and ML (C) methods. 
The BI tree is labeled with posterior probabilities. The NJ and ML 
trees are labeled with bootstrap support values. All three trees are 
drawn to scale, with branch lengths proportional to the estimated 
evolutionary distances. 
(PDF) 

Figure S5 Phylogenetic tree of primate and treeshrew 
DEFA/DEFT genes based on the entire coding region. 

The BI tree is selected as the background tree. The major clades or 
clusters having similar topologies from all three tree-building 
methods (BI, NJ and ML) are combined and labeled with the BI 
posterior probabilities and the bootstrap support values from the 
NJ and ML analyses. 
(PDF) 

Figure S6 Phylogenetic incongruence is caused by the 
long-branch attraction of homogeneity sites. The homo- 
geneity sites under convergent or parallel evolution in the mature 
peptide (62, 63, 64, 65, 68, 71, 77, 81, 88, 89, 92 and 93) are 
highlighted in different colors. These homogeneity sites can cause 
phylogenetic incongruence between the trees constructed using the 
entire coding region versus the signal-prosegment region. The 
phylogenetic tree on the left is inferred from the amino acid 
sequences of the entire coding region using the NJ method without 
removing the homogeneity sites, whereas the tree on the right is 
built using the NJ method after removing the homogeneity sites. 
When the long-branch attraction effect is eliminated, the 
sequences of the prosimian DEFA clade 2 group together. 
(PDF) 
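
Removing a fixed set of sites from an alignment before tree building, as described in this legend, amounts to dropping the corresponding columns from every sequence. The following is a minimal sketch under stated assumptions: drop_sites is a hypothetical helper, the alignment is held as a plain dictionary of equal-length strings, and the listed site numbers are assumed to be 1-based columns of the amino acid alignment. The toy alignment is synthetic.

```python
from typing import Dict, Iterable

# Site numbers listed in the legend; assumed to be 1-based alignment columns.
HOMOGENEITY_SITES = (62, 63, 64, 65, 68, 71, 77, 81, 88, 89, 92, 93)


def drop_sites(alignment: Dict[str, str], sites: Iterable[int]) -> Dict[str, str]:
    """Return a copy of the alignment without the given 1-based columns."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("aligned sequences must all have the same length")
    skip = {site - 1 for site in sites}  # convert to 0-based indices
    return {
        name: "".join(res for i, res in enumerate(seq) if i not in skip)
        for name, seq in alignment.items()
    }


# Synthetic 95-column alignment: 'W' marks the columns to be removed.
marked = "".join("W" if i + 1 in HOMOGENEITY_SITES else "A" for i in range(95))
toy_alignment = {"seq1": marked, "seq2": marked}
filtered = drop_sites(toy_alignment, HOMOGENEITY_SITES)
```

The filtered alignment can then be passed to whichever tree-building program is used, so that trees with and without these sites can be compared.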

Figure S7 The different phylogenetic relationships of 
DEFA9/DEFA11 are determined by different parts of the 
gene. A: Different patterns of sequence similarity are observed 
before and after a clear boundary in the DEFA9/DEFA11 genes, 
namely part I and part II. Phylogenetic trees are separately 
inferred from part I (the tree on the left) and part II (the tree on the 
right), using the DEFA6 cluster as the outgroup. The combined 
tree based on part I shows that DEFA11 is duplicated from DEFA9 
after the split of New World and Old World monkeys, whereas in 
the combined tree of part II, DEFA11 is the outgroup of all DEFA9 
sequences. The BI tree is computed using the GTR + G + I (4 
categories) model. The NJ tree is computed using the K2 + G 
(shape parameter = 1.8 for part I and 3.3 for part II) model. The 
ML tree is computed using the K2 + G (4 categories) model. B: 
The BI/NJ/ML trees with branch lengths proportional to the 
estimated distances inferred from part I. C: The BI/NJ/ML trees 
inferred from part II. 
(PDF) 

Figure S8 The different phylogenetic positions of hom- 
inoid DEFA8 are determined by different parts of the 
gene. A: Different patterns of sequence similarity are also 
observed before and after a clear boundary in the DEFA8/DEFA10/DEFT 
genes, which is different from that of the DEFA9/DEFA11 
genes. Similarly, phylogenetic trees are separately 
inferred from part I (the tree on the left) and part II (the tree on 
the right), using the DEFA1 cluster as the outgroup. In the 
combined tree inferred from part I, the DEFA8 from Old World 
monkeys and hominoids clusters together with DEFA10/DEFT, 
whereas in the tree inferred from part II, the DEFA8 from Old 
World monkeys and hominoids is the outgroup in the DEFA8/DEFA10/DEFT 
cluster. The BI tree is computed using the GTR + 
G + I (4 categories) model. The NJ tree is computed using the K2 
+ G (shape parameter = 1.5 for part I and 2.0 for part II) model. 
The ML tree is computed using the K2 + G (4 categories) model. 
The sequence pabe_DEFA10c clusters with DEFT, likely 
as a result of a sequence assembly error. Thus, the sequence 
pabe_DEFA10c was excluded from the subsequent analysis in Figure S9. B: 
The BI/NJ/ML trees with branch lengths proportional to the 
estimated distances inferred from part I. C: The BI/NJ/ML trees 
inferred from part II. 
(PDF) 